Generation of Physical Map Contig-Specific Sequences Useful for Whole Genome Sequence Scaffolding
نویسندگان
چکیده
Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge.
منابع مشابه
Generation of physical map contig-specific sequences
Rapid advances of the next-generation sequencing technologies have allowed whole genome sequencing of many species. However, with the current sequencing technologies, the whole genome sequence assemblies often fall in short in one of the four quality measurements: accuracy, contiguity, connectivity, and completeness. In particular, small-sized contigs and scaffolds limit the applicability of wh...
متن کاملScaffolding low quality genomes using orthologous protein sequences
MOTIVATION The ready availability of next-generation sequencing has led to a situation where it is easy to produce very fragmentary genome assemblies. We present a pipeline, SWiPS (Scaffolding With Protein Sequences), that uses orthologous proteins to improve low quality genome assemblies. The protein sequences are used as guides to scaffold existing contigs, while simultaneously allowing the g...
متن کاملFunctional motifs in Escherichia coli NC101
Escherichia coli (E. coli) bacteria can damage DNA of the gut lining cells and may encourage the development of colon cancer according to recent reports. Genetic switches are specific sequence motifs and many of them are drug targets. It is interesting to know motifs and their location in sequences. At the present study, Gibbs sampler algorithm was used in order to predict and find functional m...
متن کاملBioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes
The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polypl...
متن کاملWhole-genome profiling and shotgun sequencing delivers an anchored, gene-decorated, physical map assembly of bread wheat chromosome 6A
Bread wheat (Triticum aestivum L.) is the most important staple food crop for 35% of the world's population. International efforts are underway to facilitate an increase in wheat production, of which the International Wheat Genome Sequencing Consortium (IWGSC) plays an important role. As part of this effort, we have developed a sequence-based physical map of wheat chromosome 6A using whole-geno...
متن کامل